Elfen Scheduling: Fine-Grain Principled Borrowing from Latency-Critical Workloads Using Simultaneous Multithreading

نویسندگان

Xi Yang

Stephen M. Blackburn

Kathryn S. McKinley

چکیده

Web services from search to games to stock trading impose strict Service Level Objectives (SLOs) on tail latency. Meeting these objectives is challenging because the computational demand of each request is highly variable and load is bursty. Consequently, many servers run at low utilization (10 to 45%); turn off simultaneous multithreading (SMT); and execute only a single service — wasting hardware, energy, and money. Although co-running batch jobs with latency critical requests to utilize multiple SMT hardware contexts (lanes) is appealing, unmitigated sharing of core resources induces non-linear effects on tail latency and SLO violations. We introduce principled borrowing to control SMT hardware execution in which batch threads borrow core resources. A batch thread executes in a reserved batch SMT lane when no latency-critical thread is executing in the partner request lane. We instrument batch threads to quickly detect execution in the request lane, step out of the way, and promptly return the borrowed resources. We introduce the nanonap system call to stop the batch thread’s execution without yielding its lane to the OS scheduler, ensuring that requests have exclusive use of the core’s resources. We evaluate our approach for colocating batch workloads with latency-critical requests using the Apache Lucene search engine. A conservative policy that executes batch threads only when request lane is idle improves utilization between 90% and 25% on one core depending on load, without compromising request SLOs. Our approach is straightforward, robust, and unobtrusive, opening the way to substantially improved resource utilization in datacenters running latency-critical workloads.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Chapter 7 Workloads for Programmable Network Interfaces

Network equipment vendors are increasingly incorporating a programmable microprocessor on network interfaces to meet the performance and functionality requirements of present and emerging applications in parallel with market demand. This study identifies some properties of programmable network interface (PNI) workloads and their execution characteristics on modern high-performance microprocesso...

متن کامل

Improving Latency Tolerance of Network Processors Through Simultaneous Multithreading

Existing multithreaded network processors architecture with multiple processing engines (PEs), aims at taking advantage of blocked multithreading technique which executes instructions of different user-defined threads in the same PE pipeline, in explicit and interleave way. Multiple PEs, each of which is a multithreaded processor core, process several packets in parallel to hide long memory acc...

متن کامل

Asynchrony in Parallel Computing: From Dataflow to Multithreading

The paper presents an overview of the parallel computingmodels, architectures, and research projects that are based on asynchronous instruction scheduling. It starts with pure data ow computing models and presents an historical development of several ideas (i.e. single-token-per-arc data ow, tagged-token data ow, explicit token store, threaded data ow, large-grain data ow, RISC data ow, cycle-b...

متن کامل

Improving Latency Tolerance of Multithreading through Decoupling

ÐThe increasing hardware complexity of dynamically scheduled superscalar processors may compromise the scalability of this organization to make an efficient use of future increases in transistor budget. SMT processors, designed over a superscalar core, are therefore directly concerned by this problem. This work presents and evaluates a novel processor microarchitecture which combines two paradi...

متن کامل

Simultaneous Multithreading: Maximizing On-Chip Parallelism - Computer Architecture, 1995. Proceedings., 22nd Annual International Symposium on

This paper examines simultaneous multithreading, a technique permitting several independent threads to issue instructions to a superscalar's multiple functional units in a single cycle. We present several models of simultaneous multithreading and compare them with altemative organizations: a wide superscalar, a fine-grain multithreaded processor, and single-chip, multiple-issue multiprocessing ...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2016

Elfen Scheduling: Fine-Grain Principled Borrowing from Latency-Critical Workloads Using Simultaneous Multithreading

نویسندگان

چکیده

منابع مشابه

Chapter 7 Workloads for Programmable Network Interfaces

Improving Latency Tolerance of Network Processors Through Simultaneous Multithreading

Asynchrony in Parallel Computing: From Dataflow to Multithreading

Improving Latency Tolerance of Multithreading through Decoupling

Simultaneous Multithreading: Maximizing On-Chip Parallelism - Computer Architecture, 1995. Proceedings., 22nd Annual International Symposium on

عنوان ژورنال:

اشتراک گذاری